add Lsn type #64

Merged

ericseppanen merged 8 commits into neondatabase:main from lsn_type on Apr 26, 2021
Conversation

Contributor

@ericseppanen commented Apr 23, 2021

This type is a zero-cost wrapper for a u64, meant to help code communicate with precision what that value means, and prevent any type mix-ups.

It implements Display and Debug. Display will format as 1234ABCD/5678CDEF while Debug will format as Lsn(1234567890).
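For example, here's a sketch of the kind of mix-up the newtype catches at compile time (wait_for_lsn is a made-up function for illustration, not code from this branch):

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

fn wait_for_lsn(lsn: Lsn) {
    let _ = lsn;
}

fn main() {
    let lsn = Lsn(0x1234_ABCD_5678_CDEF);
    let _block_no: u64 = 42;

    wait_for_lsn(lsn);            // ok
    // wait_for_lsn(_block_no);   // compile error: expected `Lsn`, found `u64`
}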

This isn't ready to merge; it's for code review purposes only.
There are lots of FIXME comments that want feedback.

Fixes #49.

Contributor

@knizhnik left a comment

Well, I cannot agree that it is a "zero-cost wrapper".
Please look at this simple program:

use std::time::Instant;
use std::ops::{Add};

#[derive(Debug, Clone, Copy, Eq, Ord, PartialEq, PartialOrd)]
pub struct Lsn(pub u64);

impl From<u64> for Lsn {
    fn from(n: u64) -> Self {
        Lsn(n)
    }
}

impl Add<u64> for Lsn {
    type Output = Lsn;

    fn add(self, other: u64) -> Self::Output {
        // panic if the addition overflows.
        Lsn(self.0.checked_add(other).unwrap())
    }
}
fn inc_lsn_struct(lsn: Lsn) -> Lsn {
	lsn + 1
}

fn inc_lsn(lsn: u64) -> u64 {
	lsn + 1
}

pub fn main() {
	const ITERATIONS: u64 = 1000000000;
	{
		let now = Instant::now();
		let mut lsn = Lsn(0);
		for _ in 1..ITERATIONS {
			lsn = inc_lsn_struct(lsn);
		}
		println!("Elapsed for LSN {:?} struct {:?}", lsn, now.elapsed());
	}
	{
		let now = Instant::now();
		let mut lsn : u64 = 0;
		for _ in 1..ITERATIONS {
			lsn = inc_lsn(lsn);
		}
		println!("Elapsed for LSN {:?} struct {:?}", lsn, now.elapsed());
	}
}

Results on my notebook are the following:

knizhnik@xps:~$ rustc -O call.rs
knizhnik@xps:~$ ./call 
Elapsed for LSN Lsn(999999999) struct 27.057152ms
Elapsed for LSN 999999999 struct 39ns
knizhnik@xps:~$ rustc call.rs
knizhnik@xps:~$ ./call 
Elapsed for LSN Lsn(999999999) struct 45.84106434s
Elapsed for LSN 999999999 struct 30.92782111s

The difference is about 1.5x for non-optimized code and roughly 1000x for optimized code.
I am not sure the optimizer isn't eliminating the calls entirely in the latter case. And the speed is still very high; I do not think we will notice any difference if we use this Lsn representation.

But it is still not possible to say that the cost is zero :)

Contributor

@hlinnaka left a comment

Overall, looks good to me.

impl fmt::Display for Lsn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // FIXME: should this be "{:X}/{:08X}" ?
        write!(f, "{:X}/{:X}", self.0 >> 32, self.0 & 0xffffffff)
Contributor

The snprintf pattern used in postgres is "%X/%X", so "{:X}/{:X}" is in line with that. That said, Postgres input functions do accept the extra 0's. And the extra zeros would be helpful when you're eyeballing different LSN values trying to see which is larger, like 0/10000000 vs 0/1200000.
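For illustration, the zero-padded variant would be a one-character change and would print 0/01200000 instead of 0/1200000 (a sketch of the alternative, not the code in this branch):

use std::fmt;

pub struct Lsn(pub u64);

impl fmt::Display for Lsn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // {:08X} pads the low 32 bits to 8 hex digits so values line up visually.
        write!(f, "{:X}/{:08X}", self.0 >> 32, self.0 & 0xffff_ffff)
    }
}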

Contributor Author

I'll defer to the people with more postgres knowledge. Which one would you like?

}

impl CacheKey {
    pub fn pack(&self, buf: &mut BytesMut) {
        self.tag.pack(buf);
-       buf.put_u64(self.lsn);
+       buf.put_u64(self.lsn.0);
Contributor

Why is lsn.0 needed, why not just lsn? I thought the impl From<Lsn> for u64 trait would make that work.

Contributor Author

The function takes only a u64:

fn put_u64(&mut self, n: u64);

There are a few ways to get automatic conversions, but none of them happen to be used here:

  • Rust always tries to do "autoref" of the self argument. This is what allows you to type, e.g. my_vec.len() instead of (&my_vec).len().
  • Some functions have generic parameters that can handle the coercion, e.g. one could write:

    fn put_u64<T>(&mut self, n: T)
    where
        T: Into<u64>,
    {
        let n: u64 = n.into();
        // do stuff with n
    }

  • You can do a similar generic parameter for references using AsRef or AsMut. You'll see this in the standard library a lot, e.g. File has open<P: AsRef<Path>>(path: P), and there's a long list of std types that implement AsRef<Path>.

The question mark operator (for propagating errors) uses Into<___> for whatever your returned error type is. So even if you're not using anyhow or thiserror, implementing Into (or more likely, From) for your error types will allow you to use a variety of error types with ?.
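As a quick illustration of that last point (parse_lsn and MyError are made-up names for this sketch, not part of the PR):

use std::num::ParseIntError;

#[derive(Debug)]
enum MyError {
    BadNumber(ParseIntError),
}

// Implementing From is what lets `?` convert the library's error into ours.
impl From<ParseIntError> for MyError {
    fn from(e: ParseIntError) -> Self {
        MyError::BadNumber(e)
    }
}

fn parse_lsn(s: &str) -> Result<u64, MyError> {
    // On failure, `?` calls MyError::from(ParseIntError) and returns early.
    let n = u64::from_str_radix(s, 16)?;
    Ok(n)
}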

Contributor Author

I do agree that buf.put_u64(self.lsn.0) isn't very friendly.

However, I think that all of this manual byte-pushing code should be replaced with serde anyway. As long as we #[derive(Serialize, Deserialize)] on the Lsn type then it will work exactly the same as u64 with no extra code at all.

Contributor

It's not a big deal, we can live with it. I was just surprised it didn't "just work".

However, I think that all of this manual byte-pushing code should be replaced with serde anyway. As long as we #[derive(Serialize, Deserialize)] on the Lsn type then it will work exactly the same as u64 with no extra code at all.

That handles this particular case, but I would imagine that this arises in many other places, too. One example off the top of my head: what if you want to store an Lsn in an AtomicU64?

I think this works, too:

    buf.put_u64(self.lsn.into());

Contributor Author

I haven't found a good crate that allows Atomic<T> where it's implemented as e.g. AtomicU64 if the size is 64 bits. I made one myself, but never quite finished it. Handling the case where T is not Copy gets quite nasty, but we don't need that here, so I could probably dig out the simple version (that requires T: Copy) if we want it.

Or we could just do a quick AtomicLsn wrapper, and wait to see how often we need to cut/paste that code...
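A quick wrapper along those lines might look like this (a sketch only; the names, API, and memory orderings here are assumptions, not the committed code):

use std::sync::atomic::{AtomicU64, Ordering};

pub struct Lsn(pub u64);

pub struct AtomicLsn(AtomicU64);

impl AtomicLsn {
    pub fn new(lsn: Lsn) -> Self {
        AtomicLsn(AtomicU64::new(lsn.0))
    }

    pub fn load(&self) -> Lsn {
        // On x86 a SeqCst load compiles to a plain load, so this stays cheap.
        Lsn(self.0.load(Ordering::SeqCst))
    }

    pub fn store(&self, lsn: Lsn) {
        self.0.store(lsn.0, Ordering::SeqCst);
    }
}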

Contributor Author

Another thought: the kind of argument mismatch that put_u64 hits gets less likely as we get to more complex types (that aren't Copy), because they're likely to be passed by reference.

If we have a bunch of functions that take by-reference arguments, e.g.

fn handle_mytype(mt: &MyType);

Then you'll note that the mt argument can be satisfied by any container that Derefs to a MyType. For example, Arc<MyType> or MutexGuard<'_, MyType> both impl Deref with Target = MyType, so you can write:

handle_mytype(&mt);

... and that code will continue to work whether mt is a MyType or Arc or MutexGuard (or some other container that we build ourselves).

So your instinct that wrapper types should "just work" is still partially true, at least for the bigger data structures.
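A self-contained example of that coercion (MyType and the values are made up for illustration):

use std::sync::{Arc, Mutex};

struct MyType {
    value: u64,
}

fn handle_mytype(mt: &MyType) {
    println!("value = {}", mt.value);
}

fn main() {
    let plain = MyType { value: 1 };
    let shared = Arc::new(MyType { value: 2 });
    let locked = Mutex::new(MyType { value: 3 });

    handle_mytype(&plain);                    // &MyType directly
    handle_mytype(&shared);                   // &Arc<MyType> derefs to &MyType
    handle_mytype(&locked.lock().unwrap());   // &MutexGuard<'_, MyType> derefs too
}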

Contributor

Would it be better to add pack/unpack methods to Lsn?
So that it is possible to write:

self.tag.pack(buf);
self.lsn.pack(buf);

Contributor Author

Would it be better to add pack/unpack methods to Lsn?

Yes, this can be done automatically with #[derive(serde::Serialize, serde::Deserialize)] and then we can use any serde-compatible library we like. The bincode crate seems to implement the raw serialization we need in most places (including endian conversions).

I would need a bit more time to look at the code, but the end result is that we should be able to write things like my_data_structure.serialize(&mut buffer) without ever needing to type put_u64 again.
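To make that concrete, here's a sketch of what the serde/bincode route could look like (the struct, field names, and bincode 1.x calls are assumptions about how we'd wire it up, not code from this PR):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, Clone, Copy, PartialEq)]
pub struct Lsn(pub u64);

#[derive(Serialize, Deserialize, Debug, PartialEq)]
pub struct CacheEntry {
    // Lsn serializes exactly like a bare u64; no manual put_u64 calls needed.
    pub lsn: Lsn,
}

fn roundtrip(entry: &CacheEntry) -> CacheEntry {
    let bytes = bincode::serialize(entry).expect("serialize");
    bincode::deserialize(&bytes).expect("deserialize")
}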

There is one case I'm not certain about: when the network packet includes little-endian data that can be copied directly into a Rust struct with no validation. I don't think it's possible to generate that code with serde right now (see also #6), but if the performance is important to us, I think we can build tools to do that.

@ericseppanen
Contributor Author

Well, I cannot agree that it is a "zero-cost wrapper".
Please look at this simple program:

<snip>

The difference is about 1.5x for non-optimized code and roughly 1000x for optimized code.
I am not sure the optimizer isn't eliminating the calls entirely in the latter case. And the speed is still very high; I do not think we will notice any difference if we use this Lsn representation.

But it is still not possible to say that the cost is zero :)

Ah, but you cheated :)

The Lsn version is using checked_add to protect against overflow. If you change the u64 version to also use checked_add you will see identical performance.
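For an apples-to-apples comparison, the u64 function in your program would become something like this (a quick sketch, not measured here):

fn inc_lsn(lsn: u64) -> u64 {
    // Same overflow check as the Lsn version: panic if the addition overflows.
    lsn.checked_add(1).unwrap()
}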

It's true that you will see the extra layer of abstraction in a debug build, but in release mode Rust is very good at seeing right through abstractions like this.

Zero-cost abstractions are one of Rust's primary design goals. Rust optimization is a bit more powerful than C or C++, because the type system carries a lot more information. And the Rust optimizer will never break your code the way that often happens to C/C++ programs that didn't know they were relying on some hidden undefined behavior.

@ericseppanen
Contributor Author

And one of these days, Rust optimization will get even better! There has been a long fight against LLVM bugs, but in March the nightly compiler enabled the "mutable noalias" optimization, which means that code like this can be optimized (because mutable references are unique, these two references are known not to point to the same location):

fn adds(a: &mut i32, b: &mut i32) {
    // `a` and `b` can never alias, so the optimizer may read *b just once
    // and fold the two additions together.
    *a += *b;
    *a += *b;
}

This optimization has been enabled and reverted a few times (because of bugs in LLVM), so we might not see it in a stable release yet. But it will happen eventually.

@knizhnik
Contributor

knizhnik commented Apr 24, 2021

Ah, but you cheated :)

Sorry, I didn't expect that just one extra instruction (checking the overflow bit) could affect performance so dramatically.

@ericseppanen
Contributor Author

ericseppanen commented Apr 24, 2021

Updates:

  • rebased the branch
  • replaced sub with sub_checked (53fe4b4)
  • added an AtomicLsn type, to reduce some of the manual conversions (5dcff27)

if is_xlog_switch_record(&recordbuf) {
-    trace!(
-        "saw xlog switch record at {:X}/{:X}",
-        (self.lsn >> 32),
-        self.lsn & 0xffffffff
-    );
-    self.padlen = (WAL_SEGMENT_SIZE - (self.lsn % WAL_SEGMENT_SIZE)) as u32;
+    trace!("saw xlog switch record at {}", self.lsn);
+    self.padlen = (WAL_SEGMENT_SIZE - (self.lsn.0 % WAL_SEGMENT_SIZE)) as u32;
}

-if self.lsn % 8 != 0 {
-    self.padlen = 8 - (self.lsn % 8) as u32;
+// FIXME: what does this code do?
+if self.lsn.0 % 8 != 0 {
+    self.padlen = 8 - (self.lsn.0 % 8) as u32;
}
Contributor Author

Could someone educate me about what's happening here? It looks like:

  • If "switch record" (what's that?), then compute padding to the end of the segment.
  • If the current LSN isn't a multiple of 8, then compute padding to the next 8-byte boundary, overwriting the padding computed above (?!)

How much of this should be captured into Lsn member functions? We could add something like Lsn::pad_to(chunk_size: u64) that could be used for both.
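A minimal sketch of that helper (the pad_to name comes from this comment; it's not necessarily what ends up in the branch):

#[derive(Debug, Clone, Copy)]
pub struct Lsn(pub u64);

impl Lsn {
    /// Number of padding bytes needed to reach the next multiple of `chunk_size`.
    /// Returns 0 when the LSN is already aligned.
    pub fn pad_to(self, chunk_size: u64) -> u64 {
        let remainder = self.0 % chunk_size;
        if remainder == 0 {
            0
        } else {
            chunk_size - remainder
        }
    }
}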

Contributor

WAL in postgres is split into segments. The default size is 16MB, but unlike XLOG_BLCKSZ it can easily be changed.
Switching XLOG means starting to write to the next segment file.
Usually an XLOG switch record is not needed: a WAL segment is filled to the end and then we switch to the next segment. But in some cases we may need to force a switch to a new segment even though there is still space in the current one. It may be needed for WAL archiving.

WAL records are aligned on an 8-byte boundary. Padding is used to enforce such alignment.
Actually the code fragment can be simplified as:

self.padlen = (self.lsn.0.wrapping_neg() & 7) as u32;

Contributor

@hlinnaka commented Apr 25, 2021

If you want to see an xlog switch in action, you can force it with select pg_switch_wal();. That can come in handy in testing. See https://www.postgresql.org/docs/current/functions-admin.html.

Contributor Author

Is it a bug that the code overwrites the "switch record" padding with the 8-byte wal record padding?

Commit messages:

  • This type is a zero-cost wrapper for a u64, meant to help code communicate with precision what that value means. It implements Display and Debug. Display "{}" will format as "1234ABCD/5678CDEF" while Debug will format as Lsn(1234567890).
  • SeqWait can use any type that is Ord + Debug + Copy. Debug is not strictly necessary, but allows us to keep the panic message if a caller wants the sequence number to go backwards.
  • Use the `Lsn` type everywhere that I can find u64 being used to represent an LSN.
  • There is only one place doing subtraction, and it had a manually implemented check.
  • AtomicLsn is a wrapper around AtomicU64 that has load() and store() members that are cheap (on x86, anyway) and can be safely used in any context. This commit uses AtomicLsn in the page cache, and fixes up some downstream code that manually implemented LSN formatting. There's also a bugfix to the logging in wait_lsn, which prints the wrong lsn value.
  • There's no clear way to sum LSNs across timelines, so just remove them for now.
  • Replace open-coded math with member fns.
@ericseppanen ericseppanen merged commit 96b6f35 into neondatabase:main Apr 26, 2021
@ericseppanen ericseppanen deleted the lsn_type branch April 26, 2021 20:03